# Multi-round Visual Dialogue

InternLM-XComposer2-4KHD-7B

InternLM-XComposer2-4KHD is a general-purpose large vision-language model built on InternLM2 that can understand 4K-resolution images.

CogAgent-VQA-HF

CogAgent is an open-source vision-language model based on CogVLM. This variant focuses on single-round visual question answering.

Transformers English

CogAgent-Chat-HF

CogAgent is an open-source vision-language model that improves on CogVLM, featuring GUI agent capabilities, multi-round visual dialogue, and visual grounding.

Transformers English

© 2025 AIbase